May Prairie State Natural Area (Site 28).

Background

Data configuration

We have species composition data from 1031 1m^2 quadrats representing 27 sites.

Nearly all statistical packages require the data to be in a presence-absence form. There are several ways to do it (one of which can maintain cover values rather than changing it to binary data); I used a loop function. The result is a presence-absence matrix with Site as a column so subsamples can be organized accordingly. This column must be deleted for every analysis, however!

Table 1. Sample of data in presence-absence format.
Site SCICYP PROPEC JUNDEB JUNREP CXJOOR CXLOUI
1.1.1 1 1 1 1 1 1 0
1.1.2 1 1 1 1 0 0 1
1.1.3 1 1 1 0 0 0 1
1.1.4 1 1 1 0 0 1 1
1.1.5 1 0 0 0 0 0 1

Rarefaction and extrapolation of richness estimates and Hill numbers

Hill numbers are a family of diversity indices that combine relative abundance and species richness in various ways. The most common ones are Shannon and Simpson diversity indices, though species richness itself is also technically a Hill number!

Using the iNEXT pacagke

Each site was split into its own dataframe, then had that pesky “site” column removed. We then need to unite them all into a list (object). We then must use the function “as.incfreq”, which is part of the iNEXT package, to convert the data into a special incidence frequency list that the package expects.

#Make a list of all site dataframes
site.list.all = list(S01=S01,S02=S02,S03=S03,S04=S04,S05=S05,S06=S06,S07=S07,S08=S08,S09=S09,S10=S10,S11=S11,S12=S12,S13=S13,S14=S14,S16=S16,S17=S17,S18=S18,S19=S19,S20=S20,S21=S21,S22=S22,S23=S23,S24=S24,S25=S25,S26=S26,S27=S27,S28=S28)
#Convert everything in list to incidence frequencies
site.list.freq.all = lapply(site.list.all, as.incfreq)

Finally we use the iNEXT command to work its magic and create lots of curves based on resampling the data over and over in random order. If we change the q argument to 1 or 2, the output will return curves for Shannon and Simpson Diversity Indices respectively. We want richness, so we’ll leave it at 0. Nboot is the number of resample permutations (bootstraps); 10,000 is a lot. Note that the output must be plotted with iNEXT’s special system based on ggplot (creatively named ggiNEXT); most syntax is the same.

out.inc.all<-iNEXT(site.list.freq.all, q=0, datatype="incidence_freq", nboot=10000)

The big, muddled picture

ggiNEXT(out.inc.all, type=1, color.var="site") + 
  theme_bw(base_size = 18) +
  ylab("Species Richness") +
  xlab("Number of Quadrats") +
  ggtitle("Projectile unicorn vomit") +
  labs(caption="B=10000.")


This plot is not particularly helpful other than to visualize the general span of observed and expected richnesses and sampling efforts.

Looking closer

We’ll split up the plots to visualize the curves in more manageable doses. Note that sites were plotted for spatial efficiency rather than any logical order, and that each plot uses different scales.

Beyond curves

Other than using it to make species accumulation curves, we can use the iNEXT object to extract richness and diversity values. This object also outputs standard error as well as lower and upper estimates for each value (these were ommitted in the table below for brevity).

Table 2. Observed and expected Hill numbers from a portion of the sites.
Site RichnessObs RichnessExp ShannonObs ShannonExp SimpsonObs SimpsonExp
1 S01 33 38.387 20.533 22.985 15.633 16.636
6 S06 22 32.407 11.445 12.431 8.769 8.957
7 S07 68 102.908 52.888 72.166 43.263 54.541
8 S08 91 119.520 67.075 82.634 52.805 60.756
10 S10 51 153.307 20.113 25.348 10.620 10.853
26 S27 24 24.391 16.522 17.776 11.469 12.061
27 S28 132 187.467 88.698 106.193 66.864 73.784

Let’s visually interpret how the observed richness compares with the expected richness.

Oh. Yikes. Some of those look pretty rough. Let’s compare completion percentage (observed/expected) with sampling effort (# quadrats sampled/site radius) to see if some sites were unexpectedly low due to obvious undersampling.

There is no relationship between sampling effort and completion percentage (p=0.6366006). However, note that Sites 10 and 26 were flagged as outliers by the autoplot function. This inadequate sampling is likely the result of too few quadrats sampled at the wetland edge relative to the size of the wetland.

Moving Forward

There appears to be a relationship between site richness and site area, but unexpectedly this relationship appears to be negative. Because the data are likely non-linear, a generalized linear model should be used to assess this relationship.